A computationally efficient approach to warp factor estimation in VTLN using EM algorithm and sufficient statistics
Authors
Abstract
In this paper, we develop a computationally efficient approach to warp factor estimation in Vocal Tract Length Normalization (VTLN). We have recently shown that warped features can be obtained by a linear transformation of the unwarped features. Using these warp matrices, we show that warp factor estimation can be performed efficiently in an EM framework: the sufficient statistics are collected by aligning the unwarped utterances only once. The likelihoods of the warped features, which are needed for warp factor estimation, are then computed by appropriately modifying the sufficient statistics with the warp matrices. Using the OGI, TIDIGITS and RM tasks, we show that this approach achieves recognition performance comparable to conventional VTLN while being computationally more efficient.
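The idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes diagonal-covariance Gaussians, occupation posteriors `gamma` fixed from a single alignment of the unwarped features, and a Jacobian term log|det A| in the objective (the abstract does not say whether one is used). The function names and the candidate warp matrices are hypothetical.

```python
import numpy as np

def collect_stats(X, gamma):
    """Accumulate per-Gaussian statistics from ONE pass over unwarped data.

    X: (T, D) unwarped features; gamma: (T, J) occupation posteriors.
    """
    N = gamma.sum(axis=0)                         # (J,) occupation counts
    s = gamma.T @ X                               # (J, D) first-order sums
    S = np.einsum("tj,td,te->jde", gamma, X, X)   # (J, D, D) second-order sums
    return N, s, S

def warped_loglik(A, stats, means, variances, use_jacobian=True):
    """Expected log-likelihood of y = A x, computed from the statistics only.

    A: (D, D) warp matrix (assumed given); means, variances: (J, D).
    """
    N, s, S = stats
    D = means.shape[1]
    # Transform the statistics instead of re-warping every frame.
    s_w = s @ A.T                                       # sum of gamma * (A x)
    S_w_diag = np.einsum("de,jef,df->jd", A, S, A)      # diag(A S_j A^T)
    quad = ((S_w_diag - 2.0 * means * s_w + N[:, None] * means**2)
            / variances).sum(axis=1)
    const = N * (D * np.log(2.0 * np.pi) + np.log(variances).sum(axis=1))
    ll = -0.5 * (const + quad).sum()
    if use_jacobian:
        # Assumption: include log|det A| so different warps are comparable.
        ll += N.sum() * np.linalg.slogdet(A)[1]
    return ll

def estimate_warp(warp_matrices, stats, means, variances):
    """Return the warp factor whose matrix maximises the objective.

    warp_matrices: dict mapping a warp factor to its (D, D) matrix.
    """
    return max(warp_matrices,
               key=lambda a: warped_loglik(warp_matrices[a], stats,
                                           means, variances))
```

In this sketch the cost of scoring each candidate warp factor depends on the number of Gaussians and the feature dimension, not on the number of frames, because the frames are visited only in `collect_stats`.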
Similar resources
Using VTLN matrices for rapid and computationally-efficient speaker adaptation with robustness to first-pass transcription errors
In this paper, we propose to combine the rapid adaptation capability of conventional Vocal Tract Length Normalization (VTLN) with the computational efficiency of transform-based adaptation such as MLLR or CMLLR. VTLN requires the estimation of only one parameter and is, therefore, most suited for the cases where there is little adaptation data (i.e. rapid adaptation). In contrast, transform-bas...
Efficient pitch-based estimation of VTLN warp factors
To reduce inter-speaker variability, vocal tract length normalization (VTLN) is commonly used to transform acoustic features for automatic speech recognition (ASR). The warp factors used in this process are usually derived by maximum likelihood (ML) estimation, involving an exhaustive search over possible values. We describe an alternative approach: exploit the correlation between a speaker’s a...
Using VTLN for broadcast news transcription
Vocal tract length normalisation (VTLN) is a commonly used speaker normalisation approach. It is attractive compared to many normalisation schemes as it is typically dependent on only a single parameter, allowing the warp factors to be robustly calculated on little data. However, the scheme normally requires explicitly coding the data at multiple warp factors. Furthermore, it is only possible t...
Framework Of Feature Based Adaptation For Statistical Speech Synthesis And Recognition
The advent of statistical parametric speech synthesis has paved new ways to a unified framework for hidden Markov model (HMM) based text to speech synthesis (TTS) and automatic speech recognition (ASR). The techniques and advancements made in the field of ASR can now be adopted in the domain of synthesis. Speaker adaptation is a well-advanced topic in the area of ASR, where the adaptation data ...